APPARATUS AND METHOD FOR USING A
TARGET BASED COMPUTER VISION 
SYSTEM FOR USER INTERACTION 



BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention generally relates to a method and apparatus to 
recognize actions of a user and to have those actions correspond to specific 
computer functions. 

Description of the Related Art 

Many people find it difficult to use computers because of problems 
operating current input devices such as a keyboard or a mouse. This includes 
users with physical disabilities, young children and people who have never 
learned to use a computer but find themselves confronted with one at a public 
kiosk or similar workstation. Alternative input devices have been developed, such 
as touch screens and single-switch devices, but these have two distinct 
disadvantages. First, they rely on physical devices that often must be carefully
set up for the user, and are prone to damage or vandalism in public places.
Second, they do not allow the full range of expression needed to effectively
interact with current computer applications.

It is therefore desirable to provide a means to allow people to use
computer systems without the use of physical interface devices. Many methods
have been explored to allow users to interact with machines by way of gestures
and movements using cameras. Most of these methods, however, have one or
more of several limitations: they are very limited in the type of gesture they
can recognize, they require extensive customization for a specific user, they are
not robust in the face of environmental conditions, they are not reliable, or they
require extensive user training. A robust, flexible and user-friendly method and
apparatus is needed to allow a computer to recognize a wide range of user actions
using a camera.

SUMMARY OF THE INVENTION 

In view of the foregoing and other problems of the conventional methods,
it is, therefore, an object of the present invention to provide a structure and
method for training a computer system to recognize specific actions of a user.

The method may include displaying an image of a user within a window 
on a screen. The window may include a target area. The method may also 
include associating a first computer event with a first user action displayed in the 
target area and storing information in a memory device such that the first user
action is associated with the first computer event. 

A system is also provided that may associate specific user actions with 
specific computer commands. The system may include an image capture system 
that captures an image of a user. The system may also include an image display
system that displays said image captured by the image capture system within a 
window on a display screen. The system may recognize the specific user actions 
and associate the specific user actions with the specific computer commands. 

Other objects, advantages and salient features of the invention will become 
apparent from the following detailed description taken in conjunction with the
annexed drawings, which disclose preferred embodiments of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will be described in detail with reference to the following 
drawings in which like reference numerals refer to like elements and wherein: 
Fig. 1 shows the image capture system and display device according to the
present invention;

Figs. 2A and 2B show two different positions of the user and the target 
area within a window on a display according to the present invention; 

Fig. 3 is a flowchart showing a preferred method of the present invention;

Fig. 4 shows a hardware configuration of an information
handling/computer system for operating the present invention; and 

Fig. 5 shows a magnetic data storage diskette that may store a program 
according to the present invention. 



DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS OF THE INVENTION

As will be described below in greater detail, the system and method
according to the present invention allow a user to define a target (or
target region) in an image provided by a camera (i.e., an image capture system).
The user is allowed to describe or demonstrate what it means to activate the target
and to define a specific computer function to be associated with that activation
(e.g., a mouse click). Thereafter, whenever the system detects activation of the
target, the computer system will perform the associated function. The system may
detect activation of the target by: (1) detecting a state, (2) detecting a motion or
(3) detecting an intersection between an object in the scene and the target. These
ways of activating the target will be discussed below in greater detail. One skilled
in the art would understand that these methods of activating the target are
illustrative only and are not meant to be limiting. That is, other methods of
activating the target are also within the scope of the present invention.

The present invention will now be described with reference to the
accompanying figures. Fig. 1 shows a display device 10 having a display screen
12. The display device 10 preferably is a conventional computer screen display
and is connected to a conventional computer system as will be described below.
An image capture system 20 may be positioned at any location such that it
captures the image of the user. In one embodiment, the display device 10 and
image capture system 20 may be provided in public kiosks.

As shown in Fig. 1, the display screen 12 includes a window 30 that
displays a picture of the image that is captured by the image capture system 20.
The window 30 is shown in the lower left-hand corner of the display screen 12 but
it is not limited to this location. Rather, the window 30 may be displayed at any
appropriate location on the screen 12. The computer system will include the
appropriate hardware and software components that will connect the image
capture system 20 to the computer system such that the image captured by the
image capture system 20 is displayed within the window 30. The display screen
12 will also show a normal computer screen and will provide computer prompts in
a normal manner.

The computer system preferably includes a training phase that occurs
before the system begins operation. The system operates based on
software/hardware that will be connected to the system. In the training phase, the
image capture system 20 captures an image of the user and displays the "reversed"
image within the window 30 on the display screen 12. The user sees himself in
much the same way as he would see himself in a mirror. In the training phase, the
computer system is programmed to produce a visible target area 32 within the
window 30 such as shown in Figs. 2A and 2B. The target area 32 is preferably
movable about the window 30 using a pointing device or speech recognition
system. The system is also capable of locating another target area 32 within
another area of the window 30. The computer system will be trained to associate
specific user movements/actions (or gestures) with respective computer functions.
The user preferably demonstrates or describes the action. Demonstration occurs,
for example, by the user hitting a key when the action is being performed, or just
before and just after the action is performed. Features relating to the
movements/actions will be stored in a memory device so the system will
recognize the specific movements/actions after the training phase and associate
the movements/actions with the respective computer function.
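The key-press demonstration described above can be illustrated with a minimal sketch. It assumes each captured frame carries a timestamp and that key presses are logged on the same clock; the function name and parameters are hypothetical and are not part of the disclosure.

```python
from bisect import bisect_left, bisect_right


def select_demonstration_frames(frame_times, key_times):
    """Return indices of the frames that belong to a demonstrated action.

    frame_times: sorted capture timestamps (seconds), one per frame.
    key_times:   one timestamp (key hit while the action is performed) or
                 two timestamps (key hit just before and just after it).
    """
    if not frame_times:
        return []
    if len(key_times) == 1:
        # Single key press: take the frame captured closest to it.
        t = key_times[0]
        i = bisect_left(frame_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_times)]
        return [min(candidates, key=lambda j: abs(frame_times[j] - t))]
    if len(key_times) == 2:
        # Two key presses: take every frame between them, inclusive.
        start, end = sorted(key_times)
        lo = bisect_left(frame_times, start)
        hi = bisect_right(frame_times, end)
        return list(range(lo, hi))
    raise ValueError("expected one or two key timestamps")
```

The selected frames would then be the ones from which features are extracted and stored.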

In the training phase, the image capture system 20 initially displays an
image of the user within the window 30. Features of this image may be stored in
a memory device in order to help in recognizing the future movements/actions of
the user. The target area 32 is displayed in the window 30 and may be positioned
by the user or may be automatically positioned by the system. While the target
area 32 is displayed within the window 30, one skilled in the art would understand
that the target area 32 corresponds to a specific area as seen by the image capture
system 20.
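The correspondence between the target area drawn in the window and the area seen by the camera can be sketched as a coordinate mapping. The sketch assumes a rectangular target, a window that shows the whole camera frame scaled uniformly along each axis, and a left-right mirrored display; the function name and parameters are hypothetical.

```python
def window_rect_to_camera_rect(rect, window_size, camera_size, mirrored=True):
    """Map a target rectangle from window pixels to camera pixels.

    rect:        (x, y, width, height) of the target in window coordinates.
    window_size: (width, height) of the on-screen window.
    camera_size: (width, height) of the captured camera frame.
    mirrored:    True when the window shows a left-right reversed image.
    """
    x, y, w, h = rect
    win_w, win_h = window_size
    cam_w, cam_h = camera_size
    if mirrored:
        # Undo the mirror: the left edge in the window is the right edge
        # in the camera frame.
        x = win_w - (x + w)
    sx = cam_w / win_w
    sy = cam_h / win_h
    return (x * sx, y * sy, w * sx, h * sy)
```

With a 160 x 120 window showing a 640 x 480 frame, a target at the left of the mirrored window maps to a region at the right of the camera frame.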

The target area 32 may be a rectangular shape, a square shape, a circle 
shape or any other type of shape. As discussed above, the user may locate or 
relocate the target area 32 about the window 30 using a pointing device or speech
recognition system. The user may also manipulate the size (i.e., the dimensions) 
of the target area using respective input commands. The user may also define 
more than one target area, each with its associated function. 
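One possible representation of such a target area is sketched below. The class name, field names and the string used for the associated function are all hypothetical; the sketch only illustrates a target that can be moved, resized and tested for containment.

```python
from dataclasses import dataclass


@dataclass
class TargetArea:
    """A movable, resizable target inside the window (pixel units)."""
    x: float                  # left edge of the bounding box
    y: float                  # top edge of the bounding box
    width: float
    height: float
    shape: str = "rectangle"  # "rectangle" or "circle" (ellipse if w != h)
    function: str = ""        # name of the associated computer function

    def move(self, dx, dy):
        """Shift the target by (dx, dy) pixels."""
        self.x += dx
        self.y += dy

    def resize(self, width, height):
        """Change the dimensions of the target."""
        if width <= 0 or height <= 0:
            raise ValueError("target dimensions must be positive")
        self.width, self.height = width, height

    def contains(self, px, py):
        """True if the pixel (px, py) lies inside the target."""
        if self.shape == "circle":
            # Normalised distance from the centre of the bounding box.
            cx = self.x + self.width / 2
            cy = self.y + self.height / 2
            nx = (px - cx) / (self.width / 2)
            ny = (py - cy) / (self.height / 2)
            return nx * nx + ny * ny <= 1.0
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)
```

Several such objects, each with its own `function`, would represent more than one target area.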

The computer system preferably prompts the user to perform an action
within the target area 32 that will be associated with a specific computer function.
For example, in Fig. 2A, the user is shown as tilting his head to the left within the
window 30. When the user tilts the head into the target area 32, the computer system
recognizes the action within the target area 32. For example, the system may use
background averaging, template or color predicate methods to recognize when
part of a user's body is within the "target". Other methods will be described
below. The training image(s) or features extracted from the training image(s),
such as a summary, may then be stored in a memory device so that the computer
system may recognize this movement in the future. While Fig. 2A shows movement of
the head, this type of movement/action (gesture) is not meant to be limiting.
Rather, the user may move a hand, another part of the user's
body or another object within the target area 32.
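As an illustration of the background averaging approach mentioned above, the following minimal sketch averages several frames of the empty scene and then reports whether enough pixels inside the target differ from that average. Grayscale frames stored as lists of rows, the function names and both thresholds are assumptions made for the example.

```python
def average_background(frames):
    """Pixel-wise mean of several grayscale frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def target_occupied(frame, background, target,
                    pixel_thresh=30, area_frac=0.25):
    """True if enough pixels inside the target differ from the background.

    target is (x, y, width, height) in frame coordinates.
    """
    x, y, w, h = target
    changed = 0
    for r in range(y, y + h):
        for c in range(x, x + w):
            if abs(frame[r][c] - background[r][c]) > pixel_thresh:
                changed += 1
    return changed >= area_frac * w * h
```

The two thresholds trade sensitivity against robustness to lighting noise and would need tuning for a real camera.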

The movement/action will then be associated with a specific computer
function. For example, the user may be asked to associate that specific
movement/action within the target area 32 with a specific computer
function. One preferred computer function is that of a mouse click or the
movement of a pointing device on the screen (e.g., movement of a cursor to the
left or right). The movement may also correspond to other computer functions
and/or letters/symbols. The computer system associates the inputted computer
function (e.g., a mouse click) with the movement/action shown
within the target area 32.

In a subsequent step of the training phase, another target area 32 may be
placed in a new location as shown in Fig. 2B. In this figure, the user performs a
second movement/action such as tilting the head to the right as viewed in the
window 30. The altered image of the user within the target area 32 may be stored
in a memory device. The computer system may again ask for (or automatically
provide) a respective computer function that will correspond to this movement.
The association is stored in the memory device for future use.
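A stored association can be pictured as a small record that ties a target area and its training features to a function name. The sketch below serializes such records to JSON text that could be written to any memory device; the record layout, field names and function names are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Association:
    target: tuple    # (x, y, width, height) of the target area
    features: list   # summary extracted from the training image(s)
    function: str    # name of the associated computer function


def dump_associations(associations):
    """Serialize the trained associations to a JSON string."""
    return json.dumps([asdict(a) for a in associations])


def load_associations(text):
    """Rebuild the associations from a JSON string."""
    return [Association(tuple(d["target"]), d["features"], d["function"])
            for d in json.loads(text)]
```

Loading the text again yields the same records, so the associations survive between the training phase and later use.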

The above example describes the user performing a movement/action
and then providing a corresponding computer function that will be
associated with that movement. However, the computer system may
automatically provide a respective computer function based on a first,
second or third movement or may merely automatically associate different
movements with a specific computer function. What is important is that the
movement of the user is associated with a respective computer function either
automatically or based on the function requested by the user.

This training phase continues so that the computer system will be trained
to recognize different computer functions based on different movements/actions
of the user.

Once the user has appropriately trained the computer system to recognize
his/her movements/actions (or gestures), the computer system may enter a normal
operation phase while running a specific program/game/utility in which it will
recognize the user's movements, such as a head tilt to the right or to the left, and
will associate that movement with the previously stored computer function. The
computer system accomplishes this by capturing the image of the user using the
image capture system 20. The image of the user is preferably displayed on the
display screen 12 so the user will see his/her own movements. The target area 32
does not need to be displayed in this mode. The user may then perform normal
computer operations in a similar manner to a person using a mouse or other
pointing device because the computer has been trained to recognize specific
functions based on the movements of the user.
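The normal operation phase can be sketched as a loop that checks every trained target in every captured frame and performs the associated function. The detector is passed in as a callable, and firing only on the transition from inactive to active is an assumption of this sketch rather than something the description requires; all names are hypothetical.

```python
def run_operation(frames, targets, is_activated, functions):
    """Check every target in every frame and fire the associated function.

    frames:       iterable of captured images.
    targets:      list of dicts with "region" and "function" keys.
    is_activated: callable(frame, target) -> bool, the trained detector.
    functions:    dict mapping function names to zero-argument callables.
    Returns the names of the functions fired, in order.
    """
    fired = []
    previously_active = [False] * len(targets)
    for frame in frames:
        for i, target in enumerate(targets):
            active = is_activated(frame, target)
            # Assumption: fire once when the target becomes active, so a
            # held pose does not repeat the function on every frame.
            if active and not previously_active[i]:
                functions[target["function"]]()
                fired.append(target["function"])
            previously_active[i] = active
    return fired
```

An embodiment in which a held pose keeps triggering the function would simply drop the transition check.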

Fig. 3 is a flowchart showing the inventive methodology of the training
phase according to the present invention. In step S100, a user image is displayed
on the screen 12 within the window 30. Subsequently, the target area 32 is
displayed in the window 30 in step S104. The user image is altered, preferably by
user movement in the target area 32, in step S106. Data from the training image(s)
may be stored in step S108. The stored data is then associated with a specific
computer function in step S110. This association may be stored in memory in
step S112. Other movements/actions may be associated with additional functions
by producing another target area 32 in step S116 and repeating steps S106-S114
for that new target area 32. It is understood that while Fig. 3 shows a preferred
method of the present invention, the number of steps and the order of the steps
need not be the same as shown in Fig. 3.
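The training steps described for Fig. 3 can be summarized in a short sketch. Each step is delegated to a caller-supplied function, so the sketch shows only the order of operations; every function and parameter name is hypothetical, and the comments map lines to the step labels named in the text.

```python
def train(get_frame, place_target, extract_features, ask_function,
          more_targets):
    """Run the training steps and return the stored associations."""
    stored = []
    frame = get_frame()                    # S100: display the user image
    target = place_target(frame)           # S104: display a target area
    while True:
        frame = get_frame()                # S106: user alters the image
        features = extract_features(frame, target)  # S108: training data
        function = ask_function()          # S110: associate a function
        stored.append({"target": target,   # S112: store the association
                       "features": features,
                       "function": function})
        if not more_targets():
            return stored
        target = place_target(frame)       # S116: produce another target
```

Because the steps are injected, the same loop works whether the function is requested from the user or provided automatically.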

Figure 4 illustrates a typical hardware configuration of an information
handling/computer system for operating the present invention. Such a system
preferably has at least one processor or central processing unit (CPU) 300. The
CPU 300 may be interconnected via a system bus 301 to a random access memory
(RAM) 302, read-only memory (ROM) 303, input/output (I/O) adapter 304 (for
connecting peripheral devices such as disk units 305 and tape drives 306 to the
bus 301), communication adapter 307 (for connecting an information handling
system to a data processing network), user interface adapter 308 (for connecting a
keyboard 309, microphone 310, mouse 311, speaker 312, image capture system 20
and/or other user interface device to the bus 301), and display adapter 313 (for
connecting the bus 301 to a display device 10). As is understood by one skilled in
the art, the program that executes the above-described method according to the
present invention may be stored on a floppy disk or be stored within the memory.

Such a method as described above may be implemented, for example, by
operating a computer, as embodied by a digital data processing apparatus, to
execute a sequence of machine-readable instructions. These instructions may
reside in various types of signal-bearing media.

Thus, the present invention may be directed to a programmed product,
including signal-bearing media tangibly embodying a program of
machine-readable instructions executable by a digital data processor.

This signal-bearing media may include, for example, a random access
memory (RAM), such as a fast-access storage contained within the
computer. Alternatively, the instructions may be contained in another
signal-bearing medium, such as a magnetic storage diskette 900 shown by way of
example in Figure 5, directly or indirectly accessible by the computer.

Whether contained in the diskette, the computer, or elsewhere, the
instructions may be stored on a variety of machine-readable data storage media,
such as DASD storage (e.g., a conventional "hard drive" or a RAID array),
magnetic tape, electronic read-only memory (e.g., ROM, EPROM, or EEPROM),
an optical storage device (e.g., CD-ROM, WORM, DVD, digital optical tape,
etc.), paper "punch" cards, or other suitable signal-bearing media including
transmission media such as digital and analog communication links and
wireless links. In an illustrative embodiment of the invention, the machine-readable
instructions may comprise software object code, compiled from a suitable
language.

One alternative to this invention is the use of a touch screen and various
onscreen cues or targets. This has the drawback that it is less durable than a
camera (which can be located safely behind a small hole in the display bezel or
kiosk facing), and can only be used at close range.

As noted above, the present invention has at least three different
methods to activate the target. These are: (1) detecting a state, (2) detecting a
motion, or (3) detecting an intersection between an object in the scene and the
target. These methods will now be discussed in greater detail. Other methods of
activating the target are also within the scope of the present invention as these
methods are merely illustrative.

The first method to activate the target is detecting a state, such as a pattern
of color. During the training phase, the state of the target 32 may be demonstrated
to the system. A summary of this color pattern (e.g., a color histogram) may be saved.
The color pattern of each subsequent image during the normal operational phase
may be matched to the one from the training phase, and if it is sufficiently close,
then the target will be considered activated. Alternatively, two states may be
demonstrated during training (e.g., target empty and target full). The new patterns
may be matched to each of those, and the system will decide which is closer.
Other types of states may also be used, including statistical properties, such as
average pixel intensity, shape properties, such as the number or pattern of lines,
and possibly others.
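The color-pattern matching described above can be illustrated with a minimal sketch that summarizes the pixels of the target as a coarse RGB histogram and compares histograms by L1 distance. The bin count, the distance measure, the threshold and all function names are assumptions of the example; the text does not prescribe them.

```python
def color_histogram(pixels, bins=4):
    """Normalised RGB histogram of an iterable of (r, g, b) pixels (0-255)."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantise each channel into `bins` levels (0 .. bins - 1).
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1
        count += 1
    return [v / count for v in hist] if count else hist


def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def is_activated(current, trained, threshold=0.5):
    """Single trained state: activated when the histograms are close."""
    return histogram_distance(current, trained) <= threshold


def closest_state(current, states):
    """Two or more trained states: name of the nearest histogram."""
    return min(states,
               key=lambda name: histogram_distance(current, states[name]))
```

`is_activated` corresponds to matching against one demonstrated state, and `closest_state` to deciding between states such as target empty and target full.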

A second method of activating the target is detecting a motion. During
training, the user may either describe or demonstrate a motion within the target
32. This may be, for example, a left-to-right motion. The system may monitor
the motion within the target 32 during subsequent images using standard
techniques (e.g., optical flow, feature tracking, etc.) and produce summaries of that
motion. When a target motion matches the originally trained motion, then the
target will be considered activated. To demonstrate the motion during training,
the user may prime the system by some standard technique such as a menu item.
Then, the user may perform the motion (e.g., wave a hand through the target)
and then tell the system that the motion is done. The system monitors the motion within
the target during that time interval, and produces and saves a summary of that
motion. The system then monitors and matches the motion summary as described
above.
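A motion summary and its matching can be sketched as follows. Instead of optical flow or feature tracking, this simplified example tracks the centroid of bright pixels inside the target and summarizes the motion as an average displacement per frame; that substitution, the brightness threshold, the cosine test and all names are assumptions of the sketch.

```python
import math


def bright_centroid(frame, target, thresh=128):
    """Centroid (x, y) of pixels brighter than `thresh` in the target."""
    x0, y0, w, h = target
    sx = sy = n = 0
    for r in range(y0, y0 + h):
        for c in range(x0, x0 + w):
            if frame[r][c] > thresh:
                sx += c
                sy += r
                n += 1
    return (sx / n, sy / n) if n else None


def motion_summary(frames, target):
    """Average frame-to-frame displacement of the centroid in the target."""
    points = [p for p in (bright_centroid(f, target) for f in frames) if p]
    if len(points) < 2:
        return (0.0, 0.0)
    steps = len(points) - 1
    return ((points[-1][0] - points[0][0]) / steps,
            (points[-1][1] - points[0][1]) / steps)


def motions_match(a, b, min_cosine=0.8, min_speed=0.5):
    """True if both motions are fast enough and point the same way."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na < min_speed or nb < min_speed:
        return False
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb) >= min_cosine
```

The summary saved during training would be compared with the summary of each later time interval using `motions_match`.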

A third method of activating the target may be by intersection detection.
During training, the user may identify an object in the scene. The system may
then track that object in subsequent images. When that object intersects the
target, the target is considered activated. Object tracking can be done using a
variety of existing techniques. One class of tracking techniques involves
identifying the object explicitly for the system. The system then extracts
information that uniquely identifies the object (e.g., color pattern, pixel template,
etc.). Subsequent images are searched for that object using color segmentation,
template matching, etc. A second class involves showing the system a
background image during training that does not contain the object. The system
compares subsequent images to the background image to determine if the object is
present. In either case, the system may identify the boundaries of the object being
tracked and determine if it overlaps the target 32.
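The second class of technique, comparison against a background image, can be sketched as follows. The example finds the bounding box of all pixels that differ from the background and tests it for overlap with a rectangular target; grayscale frames as lists of rows, the threshold and the function names are assumptions.

```python
def object_bounds(frame, background, thresh=30):
    """Bounding box (x, y, w, h) of pixels that differ from the background.

    Returns None when no pixel differs by more than `thresh`.
    """
    xs, ys = [], []
    for r, (row, bg_row) in enumerate(zip(frame, background)):
        for c, (value, bg_value) in enumerate(zip(row, bg_row)):
            if abs(value - bg_value) > thresh:
                xs.append(c)
                ys.append(r)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def rects_overlap(a, b):
    """True if two (x, y, w, h) rectangles share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def target_hit(frame, background, target):
    """True if the detected object overlaps the target rectangle."""
    box = object_bounds(frame, background)
    return box is not None and rects_overlap(box, target)
```

A single bounding box is adequate only when one object moves in the scene; separating several objects would need a segmentation step that this sketch omits.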

As a further embodiment, the target region may be used for more than
binary activation. That is, rather than being activated or not, the target may be
determined to be in one of a number of states. Each of these states may have a different
computer function associated with it. For example, the motion (e.g., of a hand in the
target) within the target may be summarized as an average motion vector. As one
example, the cursor on the screen may then be moved by a direction and an
amount derived by passing that vector through an appropriate transfer function.
Further, if a user's hand (or other type of activating object) remains within the
target region beyond a predetermined time, then the cursor on the screen may
continue to move in a left direction, for example. Other variations are also
apparent to one skilled in the art.
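One possible transfer function is sketched below. It ignores small motions, scales larger ones by a gain and caps the step size; the dead zone, gain, cap, screen size and function names are all assumptions chosen for illustration.

```python
import math


def transfer(vector, dead_zone=1.0, gain=4.0, max_step=40.0):
    """Map an average motion vector (pixels/frame) to a cursor step."""
    vx, vy = vector
    speed = math.hypot(vx, vy)
    if speed < dead_zone:
        return (0.0, 0.0)  # ignore jitter
    step = min(gain * (speed - dead_zone), max_step)
    # Keep the direction of the motion, replace its length with `step`.
    return (step * vx / speed, step * vy / speed)


def move_cursor(cursor, vector, screen=(1024, 768)):
    """Apply the transfer function and keep the cursor on the screen."""
    dx, dy = transfer(vector)
    x = min(max(cursor[0] + dx, 0), screen[0] - 1)
    y = min(max(cursor[1] + dy, 0), screen[1] - 1)
    return (x, y)
```

Calling `move_cursor` once per frame while the hand stays in the target would keep the cursor moving, as in the held-hand variation described above.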

While the invention has been described with reference to specific
embodiments, the description of the specific embodiments is illustrative only and
is not to be considered as limiting the scope of the invention. Various other
modifications and changes may occur to those skilled in the art without departing
from the spirit and scope of the invention.

What is claimed is:



1. A method of enabling a computer system to recognize specific
actions of a user, said method comprising:
displaying an image of a user within a window on a screen, said
window including a target area;
associating a first computer event with a first user action displayed
in said target area; and
storing information in a memory device such that said first user
action is associated with said first computer event.

2. The method of claim 1, wherein said association comprises said
computer system detecting a change of state within said target area as said first
user action.

3. The method of claim 2, wherein said change of state comprises a
change of a pattern of color in said target area.

4. The method of claim 3, wherein said association further comprises
storing a summary of said color pattern of said target area.

5. The method of claim 1, wherein said association comprises said
computer system detecting motion within said target area as said first user action.

6. The method of claim 1, wherein said association comprises said
computer system detecting an object entering said target area as said first user
action.

7. The method of claim 1, wherein said association comprises said
computer system recognizing said first user action as a specific computer function
to execute when said first user action occurs.

8. The method of claim 1, wherein said target area is one of a
rectangular area, a circular area and a square area.

9. The method of claim 1, wherein said association comprises
associating a plurality of computer events while said first user action remains
displayed in said target area.

10. The method of claim 1, wherein said association comprises
associating a plurality of computer events with said first user action.

11. The method of claim 1, further comprising positioning said target
area within said window.

12. The method of claim 11, wherein positioning said target area
comprises locating said target area within said window using user input
commands.

13. The method of claim 1, further comprising:
producing another target area within said window;
associating a second computer event with a second user action
displayed in said another target area; and
storing information in said memory device such that said second
user action is associated with said second computer event.

14. The method of claim 1, wherein said first computer event is a
mouse click action.

15. The method of claim 1, wherein said first user action comprises a
different image of said user within said target area.

16. A method of using a computer system having an image capture
system that displays an image of a user on a display screen, said method
comprising:
enabling said computer system to recognize a specific user action
as corresponding to a specific computer event;
capturing said specific user action with said image capture system;
and
performing said specific computer event when said specific user
action is captured by said image capture system.

17. The method of claim 16, wherein said enabling comprises:
displaying said image of said user within a window on said display
screen, wherein a target area is located within said window;
associating said specific user action with said specific computer
event; and
storing information in a memory device such that said specific user
action is associated with said specific computer event.

18. The method of claim 16, further comprising positioning said target
area within said window.

19. The method of claim 16, further comprising:
producing another target area within said window;
associating a second computer event with a second user action
displayed in said another target area; and
storing information in said memory device such that said second
user action is associated with said second user event.

20. A system that associates specific user actions with specific
computer commands, said system comprising:
an image capture system that captures an image of a user;
an image display system that displays said image captured by said
image capture system within a window on a display screen; and
a computer system that recognizes said specific user actions and
associates said specific user actions with said specific computer commands.

21. The system of claim 20, wherein said computer system includes a
training phase to train the computer system to recognize said specific user actions,
said training phase comprising:
displaying said image of said user that is captured by said image
capture system within said window on said display screen, said window including
a target area;
associating a first computer command with a first user action
displayed in said target area; and
storing information in a memory device such that said first user
action is associated with said first computer command.

22. The system of claim 21, wherein said training phase further
comprises:
producing another target area within said window;
associating a second computer command with a second user action
displayed in said another target area; and
storing information in said memory device such that said second
user action is associated with said second computer command.

23. A program storage device readable by machine, tangibly
embodying a program of instructions executable by the machine to perform
method steps for training a system to recognize specific user actions, said method
steps comprising:
displaying an image of a user within a window on a screen, said
window including a target area;
associating a first computer event with a first user action displayed
in said target area; and
storing information in a memory device such that said first user
action is associated with said first computer event.



24. A method of enabling a computer system to recognize specific
actions of a user, said method comprising:
associating a first computer event with a first action displayed on a
display screen; and
storing information in a memory device such that said first action
is associated with said first computer event.

25. The method of claim 24, further comprising displaying an image of
said user within a window on said screen, said window including a target area,
and said association comprises associating said first computer event with said first
action displayed on said screen.

ABSTRACT OF THE DISCLOSURE 

A method and apparatus are provided for training a computer system to
recognize specific actions of a user. This may include displaying an image of a
user within a window on a screen. The window includes a target area. This may
also include associating a first computer event with a first user action displayed in
the target area and storing information in a memory device such that the first user
action is associated with the first computer event.